02A pairs experiments
Experiment-related scripts
Read pairwise competition outcomes that are determined by colony
counts on plates.
Read CASEU results
Map pairs and isolates to the 96-well plate layout
Read the pairwise competitive outcomes determined by colony counting
on TSA
The script below generates the table for manual keyin
if (FALSE) source("script/02A-pairs_experiment-01-pairwise_manual_key_in.R")
Then I manually checked the scanned plate images of competitive
pairs. The plate images are saved in folder
data/raw/plate_scan/emergent_coexistence_plate_scan/
divided according to the experimental batches.
| B2 |
2.6, 2.8, 7.1, 8.4, 10.2, 11.1 |
| C |
11.1 isolate 1 |
| C2 |
11.2 |
| D |
1.2, 1.4, 1.6, 1.7, 4.1, 11.5 |
# Read result colony counts and dilution factor
if (run_scripts) source("script/02A-pairs_experiment-02-colony_count.R")
There are 186x3=558 outcomes of pairwise competitions.
pairs_competition <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/pairs_competition.csv", col_types = cols())
pairs_competition
67 of of 558 pair-frequency without determined outcomes of pairwise
competitions will be determined by using CASEU method
pairs_competition %>% filter(is.na(ColonyCount))
186 pair IDs saved in data/temp/pairs_ID.csv
pairs_ID <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/pairs_ID.csv", col_types = cols())
pairs_ID
Ambiguous pairs and isolates on TSA agar plates
if (run_scripts) source("script/02A-pairs_experiment-03-pairwise_ambiguous.R")
There are 67 pair-freqs that the competition outcome cannot be
determined by TSA plate counting because of ambiguous morphology. The
ambiguous pairs will be later examined by using selective media or
Sanger sequencing.
pairs_ambiguous <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/pairs_ambiguous.csv", col_types = cols())
pairs_ambiguous
In total, 28 pairs
pairs_ambiguous %>%
group_by(Community, Isolate1, Isolate2) %>%
summarize(Count = n(), .groups = "keep")
36 isolates involved in the ambiguous pairs.
isolates_ambiguous <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/isolates_ambiguous.csv", col_types = cols())
isolates_ambiguous
Map pairs and isolates to the DW96 plate layout
if (run_scripts) source("script/02A-pairs_experiment-04-pairwise_plate_layout.R")
Each plate has 96 rows and has the following variables
plates <- read_csv("~/Dropbox/lab/emergent-coexistence/data/output/plates.csv", col_types = cols())
plates
Plates for across-community and random assembly
plates_random <- read_csv("~/Dropbox/lab/emergent-coexistence/data/output/plates_random.csv", col_types = cols())
plates_random
plates_list <- plates %>%
filter(Plate == "P1") %>%
distinct(Batch, PlateLayout) %>%
rename(batch = Batch, platelayout = PlateLayout) %>%
rowwise() %>%
mutate(plate = filter(plates, Batch == batch, PlateLayout == platelayout, Plate == "P1") %>% list) %>%
mutate(plate = prepare_plate_draw(plate) %>% list) %>%
mutate(p_plate = draw_plate_from_df(plate) %>% list)
plot_grid(plotlist = plates_list$p_plate, labels = paste0(plates_list$batch, " ", plates_list$platelayout),
ncol = 2)
Warning: Removed 2 rows containing missing values (geom_text).

Plate layout that takes different initial frequencies:
The only exception is plate C11R1 in batch C, which only has one
plate.
OD <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/OD.csv", col_types = cols())
OD %>%
filter(Community == "C1R2", Wavelength == 620) %>%
ggplot(aes(x = Transfer, y = Abs, color = Isolate1Freq)) +
geom_line() + geom_point() +
facet_grid(Isolate1 ~ Isolate2) +
theme_classic()

NA
02B CASEU sanger sequencing
To determine the relative abundance of of ambiguous pairs in
competitive assays.
Use CASEU packages to analyse the Sanger sequencing of mixture
culture.
Sanger sequencing protocol preparation. Experimental protocol
files are saved in folder output/protocol/
list.files(here::here("output/protocol"), pattern = "caseu")
Once we got the mixture Sanger sequences back, we implement
analysis by using CASEU (Compositional analysis by Sanger
electropherogram unmixing), an R package designed for quantifying
relative abundance of Sanger sequences mixture. source code. Install
CASEU from bitbucket.
Check out package
vignette.
devtools::install_bitbucket('dattamanoshi/caseu') # Install CASEU package
library(CASEU)
Test on example data
if (run_scripts) source("script/02B-CASEU_sanger_seq-00-test.R")
CASEU pilot1
The plate layout of PCR plate and list of samples are specified in
output/protocol/protocol_20190813_Sanger_seq_prep.pdf.
read_csv("~/Dropbox/lab/emergent-coexistence/data/raw/Sanger/CASEU_pilot1/protocol_20190813_Sanger_seq_prep-genewiz_table.csv", col_types = cols())
The isolates used in this round is from the list below.
isolates
| C11R1 isolate 1 |
A |
Pseudomonas |
| C1R7 isolate 2 |
B |
Pseudomonas |
| C1R7 isolate 1 |
C |
Enterobacter |
| C1R7 isolate 7 |
D |
Raoultella |
Read trace matrices for isolates and mixtures
if (run_scripts) source("script/02B-CASEU_sanger_seq-01-pilot1.R")
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_pilot1.csv", col_types = cols())
CASEU pilot2
The plate layout of PCR plate and list of samples are specified in
output/protocol/protocol_20190813_Sanger_seq_prep.pdf.
There are 32 samples from pairwise competition and 16 samples from
control. Also read Sylvie’s data.
read_csv("~/Dropbox/lab/emergent-coexistence/data/raw/Sanger/CASEU_pilot2/protocol_20190910_Sanger_seq_prep-genewiz_table_CYC.csv", col_types = cols())
Isolates A B C D in control are the isolates below.
| C11R1 isolate 1 |
A |
Pseudomonas |
| C1R7 isolate 2 |
B |
Pseudomonas |
| C1R7 isolate 1 |
C |
Enterobacter |
| C1R7 isolate 7 |
D |
Raoultella |
if (run_scripts) source("script/02B-CASEU_sanger_seq-02-pilot2.R")
#source("script/02B-CASEU_sanger_seq-02a-pilot2_Sylvie.R") # Run Sylvie's sequence
In the 12 control synthetic pairs that were made of 4 isolates,
compare these pairs’ OD frequencies, colony counts, and CASEU
predictions.
Read CASEU predicted frequencies and colony count frequencies in the
12 control pairs.
CASEU_pilot2 <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_pilot2.csv", col_types = cols())
CASEU_pilot2_plating <- read_csv("~/Dropbox/lab/emergent-coexistence/data/raw/Sanger/CASEU_pilot2/CASEU_pilot2_plating.csv", col_types = cols()) %>%
mutate(Isolate1ColonyFreq = Isolate1Colony / (Isolate1Colony + Isolate2Colony),
Isolate2ColonyFreq = Isolate2Colony / (Isolate1Colony + Isolate2Colony))
CASEU_pilot2
CASEU pilot3
The plate layout of PCR plate and list of samples are specified in
output/protocol/protocol_20190923_SequalPrep_Sanger_prep.pdf.
read_csv("~/Dropbox/lab/emergent-coexistence/data/raw/Sanger/CASEU_pilot3/protocol_20190924_Sanger_seq_prep-genewiz_table_CYC.csv", col_types = cols())
Read trace matrices for isolates and mixtures
# It takes about ~15 mins to run
if (run_scripts) source("script/02B-CASEU_sanger_seq-03-pilot3.R")
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_pilot3.csv", col_types = cols())
CASEU pilot4
The plate layout of PCR plate and list of samples are specified in
output/protocol/protocol_20191007_Sanger_seq_prep.pdf.
read_csv("~/Dropbox/lab/emergent-coexistence/data/raw/Sanger/CASEU_pilot4/protocol_20191007_Sanger_seq_prep-genewiz_table_CYC.csv", col_types = cols())
Read trace matrices for isolates and mixtures
# It takes about ~15 mins to run
if (run_scripts) source("script/02B-CASEU_sanger_seq-04-pilot4.R")
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_pilot4.csv", col_types = cols())
CASEU six plates
- Genewiz submission code: CASEU_RN_5 (which is not correct because
these are not RN plates)
- Plates:
- B2 T7 933 P2
- B2 T7 444 P2
- C2 T7 13A P2
- C2 T7 13B P2
- D T7 75 P2
- D T7 5543 P2
- Notes: the plates must be placed in order!!
# Generate sequencing csv
if (FALSE) {
genewiz_table <- tibble(Sample = 1:(96*6),
`DNA Name` = paste0(rep(c("B2_T7_933_P2", "B2_T7_444_P2", "C2_T7_13A_P2", "C2_T7_13B_P2", "D_T7_75_P2", "D_T7_5543_P2"), each = 96), "_", rep(1:96, 6)),
`Length (bp)` = 1000, `Concentration (ng/uL)` = 0.83, Primer = "27F")
write_csv(genewiz_table, here::here("output/protocol/tab_fig/20211202_Sanger_seq_prep-genewiz_table_CYC.csv"))
}
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_sixplates.csv", col_types = cols())
Random network, CASEU batch 1
- Genewiz submisstion code: Caseu_RN_1
- Plate layout: T3 C P2
- Description: samples 3, 4, 5, 12, 13, 27, 29, 33, 73, 85, 86, 91,
and 94 (order by column in the plate layout) arrived Genewiz dry, so
they are not processed.
# If will take ~2 hr to run
if (run_scripts) source("script/02B-random_network_CASEU-01-batch1.R")
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_RN1.csv", col_types = cols())
Random network, CASEU batch 2
- Genewiz submission code: Caseu_RN_2
- Plate layout: T3 C P2
- Description: this plate layout contains the isolates for all 4
random assembled communities
# It takes ~30 mins
if (run_scripts) source("script/02B-random_network_CASEU-02-batch2.R")
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_RN2.csv", col_types = cols())
Random network, CASEU batch 3
- Genewiz submisstion code: Caseu_RN_3
- Plate layout:
- P1 (T0 C P2), tracking number 30-333649143
- P2 (T0 AD P2), tracking number 30-333649365
- P3 (T3 AD P2), tracking number 30-333649517
- Description: AD plates have the isolates assembled from the
self-assembled communities but different communities
# It takes ~1.5 hr
if (run_scripts) source("script/02B-random_network_CASEU-03-batch3.R")
Use T0 OD frequencies
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_RN3.csv", col_types = cols())
Random network, CASEU batch 4
- Genewiz submisstion code: Caseu_RN_4
- Plate layout: T3 BD P3
- Description: This plate has the full AcrAss2 (B) and half of RanAss2
(D) network.
if (run_scripts) source("script/02B-random_network_CASEU-04-batch4.R")
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_RN4.csv", col_types = cols())
02C pairs_OD_CFU
Error propagation theory
Derive isolates’ \(\epsilon\)
from T8 data and calculate uncertatinty using error propagation
theory
Convert T0 OD frequency \(f^O\)
to CFU frequency \(f^C\) and estimate
the uncertainty in converted CFU frequencies at T8. Output
data/temp/pairs_CFU_freq_uncertainty.csv
Uncertainty in epsilon
if (run_scripts) source("script/02C-pairs_OD_CFU-01-epsilon_uncertainty.R")
The uncertainties in each isolate’ epsilon \(\epsilon_A = \frac{OD_A DF_A v_A}{CFU_A}\)
comes from four parts:
- \(DF\): Dilution factors. The
uncertainty comes from the pipetting in serial dilution, which is
calcaulted below.
- \(v\): Plating volumes. The
systematic error for P20 set at 20 uL is 0.4 uL.
- \(CFU\): CFU counts. The
uncertainty in CFU follows poisson distribution, which means that the
variance is the same as the mean (measured CFU). The standard deviation
is \(\sqrt{CFU}\).
- \(OD\): OD measurement. The
uncertainty in measuring OD in plate reader, which is assumed to be
0.001.
Dilution factor
The uncertainty in dilution factor comes from the pipetting steps in
serial dilution, which include two pipetting volumes:
V1: serial dispensing 10 uL of diluted solution using
mP20. It has uncertainty ErrorV1 2 uL.
V2: dispense 90 uL of PBS using mP200. It has
uncertainty ErrorV2 0.4 uL.
For dilution factor \(10^{n}\), it
has the the mean \((\frac{V_1}{V_1+V_2})^n\). To obtain the
uncertainty in the measured mean, first we caluculate the partial
derivatives \(\frac{\partial DF}{\partial
V_1}\) and \(\frac{\partial
DF}{\partial V_2}\). Then the uncertainty \(\delta DF = \sqrt{(\frac{\partial DF}{\partial
V_1}\delta V_1)^2 + (\frac{\partial DF}{\partial V_2}\delta
V_2)^2}\)
dilution_factor <- tibble(n = c(4, 5), V1 = 10, V2 = 90, ErrorV1 = 0.4, ErrorV2 = 2) %>%
mutate(PartialV1 = n * V1^(n-1) * V2 / (V1+V2)^(n+1), #-n*(V1+V2)^(n-1)*V2/(V1^(n+1)),
PartialV2 = -n * V1^(n-1) / (V1+V2)^(n+1), #n*(V1+V2)^(n-1)/(V1^n),
DF = (V1/(V1+V2))^n,
ErrorDF = sqrt((PartialV1*ErrorV1)^2 + (PartialV2*ErrorV2)^2))
dilution_factor
By using error propagation theroy, the uncertainty in isolates
epsilon has the form
\[\delta \epsilon =
\sqrt{(\frac{\partial \epsilon}{\partial DF}\delta DF)^2
+(\frac{\partial \epsilon}{\partial CFU}\delta CFU)^2 +
(\frac{\partial \epsilon}{\partial OD}\delta OD)^2 +
(\frac{\partial \epsilon}{\partial V}\delta V)^2}\]
isolates_epsilon_uncertainty <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/isolates_epsilon_uncertainty.csv", col_types = cols())
isolates_epsilon_uncertainty
NA
There are 7 isolates that have either 0 CFU or negative OD value, so
they don’t have epsilon values.
isolates_epsilon_uncertainty %>%
filter(is.na(Epsilon))
Convert T0 OD frequencies to CFU frequencies
if (run_scripts) source("script/02C-pairs_OD_CFU-02-CFU_frequency.R")
The outcome of pairwise competition were determined by comparing the
frequency changes between T0 and T8. The T8 frequencies were determined
by plating the mature media on rich agar media on which the colonies
were counted, whereas T0 frequencies were set to 95/5, 50/50 and 5/95 by
mixing two isolate inocula with equal OD. In this section, I will use
the OD-CFU conversion coefficient \(\epsilon\) derived from T8 isolate data to
convert T0 OD frequencies into CFU frequencies. In specific, the CFU
frequency of isolate 1 \(f^C_1\) of a
pair can be derived from OD frequencies of isolate 1 and 2 \(f^O_1 ,f^O_2\), which have the relationship
below.
\(f^C_1 =
\frac{f^o_1\epsilon_1DFv}{f^o_1\epsilon_1DFv +
f^o_2\epsilon_2DFv}=\frac{f^o_1\epsilon_1}{f^o_1\epsilon_1 +
f^o_2\epsilon_2}\)
where \(\epsilon\) of each isolate
was pre-calculated from T8 dataset. DF and v are the same in
conversion.
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/pairs_CFU_freq.csv", col_types = cols())
Uncertainty in T0 CFU frequencies
if (run_scripts) source("script/02C-pairs_OD_CFU-03-CFU_frequency_uncertainty.R")
Write CFU frequencies to
data/temp/pairs_CFU_freq_uncertainty.csv
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/pairs_CFU_freq_uncertainty.csv", col_types = cols())
02D determine pairwise interaction
Combine outcomes of pairwise competition from CFU counts and
CASEU
Raw data (e.g., CFU counts and CASEU Sanger sequences) are processed
and generated into temporary result csv:
02B-CASEU_sanger_seq reads CASEU raw data and
outputs temp/CASEU_pilot2.csv and
temp/CASEU_pilot3.csv. Both files are CASEU predicted T8
frequencies.
02C-pairs_OD_CFU reads pair_competition and dilution
factor data, and it outputs
temp/pairs_CFU_freq_uncertainty.csv, which has the T0
OD-converted CFU frequencies and T8 CFU frequencies with
uncertainties.
if (run_scripts) source("script/02D-determine_pairwise_interaction-01-combine_CFU_CASEU_result.R")
The script in this section returns a data.frame
pairs_freq that has the following variables:
Community
Isolate1 and Isolate2: indices of isolates
within a community. The number of isolate1 is always smaller than
isolate2
Isolate1InitialODFreq and
Isolate2InitialODFreq: 5, 50 or 95. The initial OD
frequencies of isolates at T0. This two serve as discrete grouping
variables.
Time: T0 or T8.
Isolate1MeasuredFreq: the measured frequency od
isolate1 in the pair.
ErrorIsolate1MeasuredFreq: the uncertainty of
Isolate1MeasuredFreq.
RawDataType: OD, ODtoCFU, CFU, Sanger (CASEU). The raw
data type in which the isolate frequencies were measured.
Contamination: logical. There are contamination in
three pairs at T8 plates.
186x3x2=1116 pair-freq at two time points for the self-assembled
community networks.
read_csv("~/Dropbox/lab/emergent-coexistence/data/output/pairs_freq.csv", col_types = cols())
Determine pairwise interactions
if (run_scripts) source("script/02D-determine_pairwise_interaction-02-determine_pairwise_interaction.R")
Table of all 27 + 4 possible combinations of fitness function and
their interaction types
read_csv("~/Dropbox/lab/emergent-coexistence/data/output/pairs_interaction_table.csv", col_types = cols())
Table of interaction types
pairs_interaction_fitness <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/pairs_interaction_fitness.csv", col_types = cols())
pairs_interaction_fitness %>%
group_by(Set, InteractionType) %>%
summarize(Count = n(), .groups = "keep")
Isolate tournament
if (run_scripts) source("script/02D-determine_pairwise_interaction-03-isolate_tournament.R")
Tournament ranks of each isolate. Note that I consider neturality and
bistability as draw in the tournament.
Score: the competitive scores of isolate. This score is
computed by the formula: number of wins - number of loses + 0 * number
of draws.
Game: number of pairwise competition the isolate has
played. The number of games an isolate plated within a community should
be community size minus one.
Rank: the ranks based on Score. The ranks
range from 1 to the focal community size. Isolates with the same scores
in a community are given the same rank.
PlotRank: continuous rank for plotting
convenience.
Summary
Read and combine pairs data
if (run_scripts) source("script/02-pairs-01-read_data.R")
186 pairs of self-assembled communities and 112 pair from
across-community and random networks
Pairs with three initial frequencies and three. 186x3x2=1116
pairs_meta contains the growth traits of isolates. Note that the
order of Isolate1 and Isolate2 in some pairs are flipped such that
Isolate1 is always the dominant strain and Isolate 2 is the subdomaint
one. Dominant strain is the winner in exclusion pairs, and the more
abundant strain (50:50 treatment) in coexistence pairs.
if (FALSE) pairs_meta <- read_csv("~/Dropbox/lab/emergent-coexistence/data/output/pairs_meta.csv", col_types = cols())
Example outcome types
if (run_scripts) source("script/02-pairs-02-outcome_types.R")
Coarse-grained
Fine-grained
read_csv("~/Dropbox/lab/emergent-coexistence/data/output/pairs_example_outcomes_finer.csv", col_types = cols())
---
title: "Analysis on pairwise competition assays"
author: "Chang-Yu Chang"
date: "`r Sys.Date()`"
output:
  html_notebook:
    toc: yes
    number_sections: no
---

```{r setup, include = F}
knitr::opts_chunk$set(cache = FALSE, echo = TRUE)
library(tidyverse)
library(cowplot)
source(here::here("plotting_scripts/misc.R"))
run_scripts <- F
```


# 02A pairs experiments

- Experiment-related scripts

- Read pairwise competition outcomes that are determined by colony counts on plates. 

- Read CASEU results

- Map pairs and isolates to the 96-well plate layout


## Read the pairwise competitive outcomes determined by colony counting on TSA

The script below generates the table for manual keyin

```{r}
if (FALSE) source("script/02A-pairs_experiment-01-pairwise_manual_key_in.R")
```

Then I manually checked the scanned plate images of competitive pairs. The plate images are saved in folder `data/raw/plate_scan/emergent_coexistence_plate_scan/` divided according to the experimental batches.

Batch | Community
------|-----------
B2    | 2.6, 2.8, 7.1, 8.4, 10.2, 11.1
C     | 11.1 isolate 1
C2    | 11.2
D     | 1.2, 1.4, 1.6, 1.7, 4.1, 11.5


``` {r}
# Read result colony counts and dilution factor
if (run_scripts) source("script/02A-pairs_experiment-02-colony_count.R")
```

There are 186x3=558 outcomes of pairwise competitions.

```{r}
pairs_competition <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/pairs_competition.csv", col_types = cols())
pairs_competition
```

67 of of 558 pair-frequency without determined outcomes of pairwise competitions will be determined by using CASEU method

```{r}
pairs_competition %>% filter(is.na(ColonyCount))
```

186 pair IDs saved in `data/temp/pairs_ID.csv`

```{r}
pairs_ID <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/pairs_ID.csv", col_types = cols())
pairs_ID
```



## Ambiguous pairs and isolates on TSA agar plates

```{r}
if (run_scripts) source("script/02A-pairs_experiment-03-pairwise_ambiguous.R")
```


There are 67 pair-freqs that the competition outcome cannot be determined by TSA plate counting because of ambiguous morphology. The ambiguous pairs will be later examined by using selective media or Sanger sequencing. 

```{r}
pairs_ambiguous <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/pairs_ambiguous.csv", col_types = cols())
pairs_ambiguous
```

In total, 28 pairs

```{r}
pairs_ambiguous %>% 
  group_by(Community, Isolate1, Isolate2) %>%
  summarize(Count = n(), .groups = "keep")
```

36 isolates involved in the ambiguous pairs. 

```{r}
isolates_ambiguous <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/isolates_ambiguous.csv", col_types = cols())
isolates_ambiguous
```



## Map pairs and isolates to the DW96 plate layout

```{r}
if (run_scripts) source("script/02A-pairs_experiment-04-pairwise_plate_layout.R")
```

Each plate has 96 rows and has the following variables

```{r}
plates <- read_csv("~/Dropbox/lab/emergent-coexistence/data/output/plates.csv", col_types = cols())
plates
```


Plates for across-community and random assembly

```{r}
plates_random <- read_csv("~/Dropbox/lab/emergent-coexistence/data/output/plates_random.csv", col_types = cols())
plates_random
```



```{r fig.height = 4, fig.width = 3}
plates_list <- plates %>%
    filter(Plate == "P1") %>%
    distinct(Batch, PlateLayout) %>%
    rename(batch = Batch, platelayout = PlateLayout) %>%
    rowwise() %>%
    mutate(plate = filter(plates, Batch == batch, PlateLayout == platelayout, Plate == "P1") %>% list) %>%
    mutate(plate = prepare_plate_draw(plate) %>% list) %>%
    mutate(p_plate = draw_plate_from_df(plate) %>% list) 

plot_grid(plotlist = plates_list$p_plate, labels = paste0(plates_list$batch, " ", plates_list$platelayout),
          ncol = 2)

```

Plate layout that takes different initial frequencies:

- P1 is 50%:50%.

- P2 and P3 are identical and whose rows are 95% and columns are 5%.

The only exception is plate C11R1 in batch C, which only has one plate.

```{r}
OD <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/OD.csv", col_types = cols())

OD %>%
    filter(Community == "C1R2", Wavelength == 620) %>%
    ggplot(aes(x = Transfer, y = Abs, color = Isolate1Freq)) +
    geom_line() + geom_point() +
    facet_grid(Isolate1 ~ Isolate2) +
    theme_classic()
    
```



# 02B CASEU sanger sequencing

- To determine the relative abundance of of ambiguous pairs in competitive assays. 

- Use CASEU packages to analyse the Sanger sequencing of mixture culture.

- Sanger sequencing protocol preparation. Experimental protocol files are saved in folder `output/protocol/`

```{r}
list.files(here::here("output/protocol"), pattern = "caseu")
```

- Once we got the mixture Sanger sequences back, we implement analysis by using `CASEU` (Compositional analysis by Sanger electropherogram unmixing), an R package designed for quantifying relative abundance of Sanger sequences mixture. [source code](https://bitbucket.org/DattaManoshi/caseu). Install `CASEU` from bitbucket.

- Check out [package vignette](https://htmlpreview.github.io/?https://bitbucket.org/dattamanoshi/caseu/raw/master/doc/CASEU_Vignette.html).

```{r, echo = TRUE, eval = FALSE}
devtools::install_bitbucket('dattamanoshi/caseu') # Install CASEU package
library(CASEU)
```


## Test on example data

```{r}
if (run_scripts) source("script/02B-CASEU_sanger_seq-00-test.R")
```


## CASEU pilot1

The plate layout of PCR plate and list of samples are specified in `output/protocol/protocol_20190813_Sanger_seq_prep.pdf`. 

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/raw/Sanger/CASEU_pilot1/protocol_20190813_Sanger_seq_prep-genewiz_table.csv", col_types = cols())
```

The isolates used in this round is from the list below.

Isolate         |  Code     |  Taxa
----------------|-----------|-------
C11R1 isolate 1 | A         | Pseudomonas
C1R7 isolate 2  | B         | Pseudomonas
C1R7 isolate 1  | C         | Enterobacter
C1R7 isolate 7  | D         | Raoultella
Table: isolates

Read trace matrices for isolates and mixtures

```{r}
if (run_scripts) source("script/02B-CASEU_sanger_seq-01-pilot1.R")
```


```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_pilot1.csv", col_types = cols())
```


## CASEU pilot2

The plate layout of PCR plate and list of samples are specified in `output/protocol/protocol_20190813_Sanger_seq_prep.pdf`. There are 32 samples from pairwise competition and 16 samples from control. Also read Sylvie's data. 

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/raw/Sanger/CASEU_pilot2/protocol_20190910_Sanger_seq_prep-genewiz_table_CYC.csv", col_types = cols())
```

Isolates A B C D in control are the isolates below. 

Isolate         |  Code     |  Taxa
----------------|-----------|-------
C11R1 isolate 1 | A         | Pseudomonas
C1R7 isolate 2  | B         | Pseudomonas
C1R7 isolate 1  | C         | Enterobacter
C1R7 isolate 7  | D         | Raoultella

```{r}
if (run_scripts) source("script/02B-CASEU_sanger_seq-02-pilot2.R")
#source("script/02B-CASEU_sanger_seq-02a-pilot2_Sylvie.R") # Run Sylvie's sequence
```


In the 12 control synthetic pairs that were made of 4 isolates, compare these pairs' OD frequencies, colony counts, and CASEU predictions.

Read CASEU predicted frequencies and colony count frequencies in the 12 control pairs.

```{r}
CASEU_pilot2 <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_pilot2.csv", col_types = cols())
CASEU_pilot2_plating <- read_csv("~/Dropbox/lab/emergent-coexistence/data/raw/Sanger/CASEU_pilot2/CASEU_pilot2_plating.csv", col_types = cols()) %>%
  mutate(Isolate1ColonyFreq = Isolate1Colony / (Isolate1Colony + Isolate2Colony),
    Isolate2ColonyFreq = Isolate2Colony / (Isolate1Colony + Isolate2Colony))
CASEU_pilot2
```



## CASEU pilot3

The plate layout of PCR plate and list of samples are specified in `output/protocol/protocol_20190923_SequalPrep_Sanger_prep.pdf`. 

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/raw/Sanger/CASEU_pilot3/protocol_20190924_Sanger_seq_prep-genewiz_table_CYC.csv", col_types = cols())
```

Read trace matrices for isolates and mixtures

```{r}
# It takes about ~15 mins to run 
if (run_scripts) source("script/02B-CASEU_sanger_seq-03-pilot3.R")
```

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_pilot3.csv", col_types = cols())
```


## CASEU pilot4

The plate layout of PCR plate and list of samples are specified in `output/protocol/protocol_20191007_Sanger_seq_prep.pdf`. 

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/raw/Sanger/CASEU_pilot4/protocol_20191007_Sanger_seq_prep-genewiz_table_CYC.csv", col_types = cols())
```

Read trace matrices for isolates and mixtures

```{r}
# It takes about ~15 mins to run
if (run_scripts) source("script/02B-CASEU_sanger_seq-04-pilot4.R")
```


```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_pilot4.csv", col_types = cols())
```


## CASEU six plates

- Genewiz submission code: CASEU_RN_5 (which is not correct because these are not RN plates)
- Plates:
    - B2 T7 933 P2
    - B2 T7 444 P2
    - C2 T7 13A P2
    - C2 T7 13B P2
    - D T7 75 P2
    - D T7 5543 P2
- Notes: the plates must be placed in order!!

```{r}
# Generate sequencing csv
if (FALSE) {
genewiz_table <- tibble(Sample = 1:(96*6), 
       `DNA Name` = paste0(rep(c("B2_T7_933_P2", "B2_T7_444_P2", "C2_T7_13A_P2", "C2_T7_13B_P2",  "D_T7_75_P2", "D_T7_5543_P2"), each = 96), "_", rep(1:96, 6)), 
       `Length (bp)` = 1000, `Concentration (ng/uL)` = 0.83, Primer = "27F")

write_csv(genewiz_table, here::here("output/protocol/tab_fig/20211202_Sanger_seq_prep-genewiz_table_CYC.csv"))

}
```

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_sixplates.csv", col_types = cols())
```



## Random network, CASEU batch 1

- Genewiz submisstion code: Caseu_RN_1
- Plate layout: T3 C P2
- Description: samples 3, 4, 5, 12, 13, 27, 29, 33, 73, 85, 86, 91, and 94 (order by column in the plate layout) arrived Genewiz dry, so they are not processed.

```{r}
# If will take ~2 hr to run
if (run_scripts) source("script/02B-random_network_CASEU-01-batch1.R")
```

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_RN1.csv", col_types = cols())
```



## Random network, CASEU batch 2

- Genewiz submission code: Caseu_RN_2
- Plate layout: T3 C P2
- Description: this plate layout contains the isolates for all 4 random assembled communities 

```{r}
# It takes ~30 mins
if (run_scripts) source("script/02B-random_network_CASEU-02-batch2.R")
```

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_RN2.csv", col_types = cols())
```


## Random network, CASEU batch 3

- Genewiz submisstion code: Caseu_RN_3
- Plate layout: 
    - P1 (T0 C P2), tracking number 30-333649143
    - P2 (T0 AD P2), tracking number 30-333649365
    - P3 (T3 AD P2), tracking number 30-333649517
- Description: AD plates have the isolates assembled from the self-assembled communities but different communities

```{r}
# It takes ~1.5 hr
if (run_scripts) source("script/02B-random_network_CASEU-03-batch3.R")
```

Use T0 OD frequencies

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_RN3.csv", col_types = cols())
```



## Random network, CASEU batch 4

- Genewiz submisstion code: Caseu_RN_4
- Plate layout: T3 BD P3
- Description: This plate has the full AcrAss2 (B) and half of RanAss2 (D) network.


```{r}
if (run_scripts) source("script/02B-random_network_CASEU-04-batch4.R")
```


```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/CASEU_RN4.csv", col_types = cols())
```




# 02C pairs_OD_CFU

- Error propagation theory

- Derive isolates' $\epsilon$ from T8 data and calculate uncertatinty using error propagation theory

- Convert T0 OD frequency $f^O$ to CFU frequency $f^C$ and estimate the uncertainty in converted CFU frequencies at T8. Output `data/temp/pairs_CFU_freq_uncertainty.csv`

## General formula of error propagation

This section explains the error propagation theory to estimate the uncertainty in the experimental measurement. There are one or more quantities $x, y, ...$, with corresponding uncertainties $\delta x, \delta y, ...$ and that we wish to use the measured values of x and y to calculate the quantity of real interest q. There are three provisional rules:

1. The square-root rule for counting experiments. The average number of event in time T is $v\pm\sqrt{v}$

2. Uncertainty in **sums and differences**. The computed mean value $q=x+y+z-(u+w)$. If the uncertainties in x, ..., w are known to be independent and random, then the uncertainty in the computed value of q is the quadratic sum $\delta q = \sqrt{(\delta x)^2+(\delta y)^2+(\delta z)^2+(\delta u)^2+(\delta w)^2}$ , In any case, $\delta q$ is never larger than their ordinary sum $\delta q \leqslant \delta x + \delta y+ \delta z + \delta u + \delta w$

3. Uncertainty in **product and quotients**. $q=\frac{x\times z}{u\times w}$, then the fractional uncertainty in the computed value q is the sum. If the uncertainties in x, ...,w are independent and random, then the fractional uncertainty in q is the sum in quadrature of the original uncertainties, $\frac{\delta q}{\left|q\right|} = \sqrt{(\frac{\delta x}{\left|x\right|})^2 + \frac{\delta y}{\left|y\right|})^2 +\frac{\delta z}{\left|z\right|})^2 +\frac{\delta u}{\left|u\right|})^2 + \frac{\delta w}{\left|w\right|})^2}$. $\frac{\delta q}{\left| q \right|} \leqslant \frac{\delta x}{\left| x \right|} + \frac{\delta z}{\left| z \right|} + \frac{\delta u}{\left| u \right|} + \frac{\delta w}{\left| w \right|}$ 

Taking these three provisional rules together, the general formula for error propagation takes the following form. Assume the computed quantity is a function of $x_1, x_2, ..., x_n$, the uncertainty in q is $$\delta q= \sqrt{(\sum{\frac{\partial q}{\partial x_i}}\delta x_i)^2}$$ 



## Uncertainty in epsilon 

```{r}
if (run_scripts) source("script/02C-pairs_OD_CFU-01-epsilon_uncertainty.R")
```


The uncertainties in each isolate' epsilon $\epsilon_A = \frac{OD_A DF_A v_A}{CFU_A}$ comes from four parts:

1. $DF$: Dilution factors. The uncertainty comes from the pipetting in serial dilution, which is calcaulted below.
2. $v$: Plating volumes. The systematic error for P20 set at 20 uL is 0.4 uL.
3. $CFU$: CFU counts. The uncertainty in CFU follows poisson distribution, which means that the variance is the same as the mean (measured CFU). The standard deviation is $\sqrt{CFU}$.
4. $OD$: OD measurement. The uncertainty in measuring OD in plate reader, which is assumed to be 0.001.


**Dilution factor**

The uncertainty in dilution factor comes from the pipetting steps in serial dilution, which include two pipetting volumes:

1. `V1`: serial dispensing 10 uL of diluted solution using mP20. It has uncertainty `ErrorV1` 2 uL.
2. `V2`: dispense 90 uL of PBS using mP200. It has uncertainty `ErrorV2` 0.4 uL.

For dilution factor $10^{n}$, it has the the mean $(\frac{V_1}{V_1+V_2})^n$. To obtain the uncertainty in the measured mean, first we caluculate the partial derivatives $\frac{\partial DF}{\partial V_1}$ and $\frac{\partial DF}{\partial V_2}$. Then the uncertainty $\delta DF = \sqrt{(\frac{\partial DF}{\partial V_1}\delta V_1)^2 + (\frac{\partial DF}{\partial V_2}\delta V_2)^2}$


```{r}
dilution_factor <- tibble(n = c(4, 5), V1 = 10, V2 = 90, ErrorV1 = 0.4, ErrorV2 = 2) %>%
  mutate(PartialV1 = n * V1^(n-1) * V2 / (V1+V2)^(n+1), #-n*(V1+V2)^(n-1)*V2/(V1^(n+1)),
         PartialV2 = -n * V1^(n-1) / (V1+V2)^(n+1), #n*(V1+V2)^(n-1)/(V1^n),
         DF = (V1/(V1+V2))^n,
         ErrorDF = sqrt((PartialV1*ErrorV1)^2 + (PartialV2*ErrorV2)^2))

dilution_factor
```


By using error propagation theroy, the uncertainty in isolates epsilon has the form 

$$\delta \epsilon = \sqrt{(\frac{\partial  \epsilon}{\partial DF}\delta DF)^2 +(\frac{\partial  \epsilon}{\partial CFU}\delta CFU)^2 + (\frac{\partial  \epsilon}{\partial OD}\delta OD)^2 + (\frac{\partial  \epsilon}{\partial V}\delta V)^2}$$


```{r}
isolates_epsilon_uncertainty <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/isolates_epsilon_uncertainty.csv", col_types = cols())
isolates_epsilon_uncertainty
```

There are 7 isolates that have either 0 CFU or negative OD value, so they don't have epsilon values.

```{r}
isolates_epsilon_uncertainty %>%
  filter(is.na(Epsilon))
```

## Convert T0 OD frequencies to CFU frequencies

```{r}
if (run_scripts) source("script/02C-pairs_OD_CFU-02-CFU_frequency.R")
```

The outcome of pairwise competition were determined by comparing the frequency changes between T0 and T8. The T8 frequencies were determined by plating the mature media on rich agar media on which the colonies were counted, whereas T0 frequencies were set to 95/5, 50/50 and 5/95 by mixing two isolate inocula with equal OD.  In this section, I will use the OD-CFU conversion coefficient $\epsilon$ derived from T8 isolate data to convert T0 OD frequencies into CFU frequencies. In specific, the CFU frequency of isolate 1 $f^C_1$ of a pair can be derived from OD frequencies of isolate 1 and 2 $f^O_1 ,f^O_2$, which have the relationship below. 

$f^C_1 = \frac{f^o_1\epsilon_1DFv}{f^o_1\epsilon_1DFv + f^o_2\epsilon_2DFv}=\frac{f^o_1\epsilon_1}{f^o_1\epsilon_1 + f^o_2\epsilon_2}$

where $\epsilon$ of each isolate was pre-calculated from T8 dataset. DF and v are the same in conversion.

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/pairs_CFU_freq.csv", col_types = cols())
```


## Uncertainty in T0 CFU frequencies

```{r}
if (run_scripts) source("script/02C-pairs_OD_CFU-03-CFU_frequency_uncertainty.R")
```

Write CFU frequencies to `data/temp/pairs_CFU_freq_uncertainty.csv`

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/pairs_CFU_freq_uncertainty.csv", col_types = cols())
```


# 02D determine pairwise interaction

## Combine outcomes of pairwise competition from CFU counts and CASEU

Raw data (e.g., CFU counts and CASEU Sanger sequences) are processed and generated into temporary result csv:

1. `02B-CASEU_sanger_seq` reads CASEU raw data and outputs `temp/CASEU_pilot2.csv` and `temp/CASEU_pilot3.csv`. Both files are CASEU predicted T8 frequencies.

2. `02C-pairs_OD_CFU` reads pair_competition and dilution factor data, and it outputs `temp/pairs_CFU_freq_uncertainty.csv`, which has the T0 OD-converted CFU frequencies and T8 CFU frequencies with uncertainties.


```{r}
if (run_scripts) source("script/02D-determine_pairwise_interaction-01-combine_CFU_CASEU_result.R")
```

The script in this section returns a data.frame `pairs_freq` that has the following variables:

- `Community`
- `Isolate1` and `Isolate2`: indices of isolates within a community. The number of isolate1 is always smaller than isolate2
- `Isolate1InitialODFreq` and `Isolate2InitialODFreq`: 5, 50 or 95. The initial OD frequencies of isolates at T0. This two serve as discrete grouping variables.
- `Time`: T0 or T8.
- `Isolate1MeasuredFreq`: the measured frequency od isolate1 in the pair. 
- `ErrorIsolate1MeasuredFreq`: the uncertainty of `Isolate1MeasuredFreq`.
- `RawDataType`: OD, ODtoCFU, CFU, Sanger (CASEU). The raw data type in which the isolate frequencies were measured.
- `Contamination`: logical. There are contamination in three pairs at T8 plates.

186x3x2=1116 pair-freq at two time points for the self-assembled community networks.


```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/output/pairs_freq.csv", col_types = cols())
```




## Determine pairwise interactions

```{r}
if (run_scripts) source("script/02D-determine_pairwise_interaction-02-determine_pairwise_interaction.R")
```

Table of all 27 + 4 possible combinations of fitness function and their interaction types 

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/output/pairs_interaction_table.csv", col_types = cols())
```


Table of interaction types

```{r}
pairs_interaction_fitness <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/pairs_interaction_fitness.csv", col_types = cols())
pairs_interaction_fitness %>%
  group_by(Set, InteractionType) %>%
  summarize(Count = n(), .groups = "keep") 
```

## Isolate tournament


```{r}
if (run_scripts) source("script/02D-determine_pairwise_interaction-03-isolate_tournament.R")
```

Tournament ranks of each isolate. Note that I consider neturality and bistability as draw in the tournament.

- `Score`: the competitive scores of isolate. This score is computed by the formula: number of wins - number of loses + 0 * number of draws. 
- `Game`: number of pairwise competition the isolate has played. The number of games an isolate plated within a community should be community size minus one. 
- `Rank`: the ranks based on `Score`. The ranks range from 1 to the focal community size. Isolates with the same scores in a community are given the same rank.
- `PlotRank`: continuous rank for plotting convenience.


# 02E competition phylogeny

- Correlate competition result with phylogenetic distance

- Measure the pairwise phylogenetic distances by difference in 16S base pairs `pairs_taxonomy`
    - Coarse-grained taxonomy (Family and Genus)
    - Compute the pairwise sequence differences using `Biostring::pairwiseAlignment()` 
    - Compute pairwise tree distance using `ape::cophenetic.phylo()` on built tree. Try out different tree building methods


## Isolates' RDP taxonomy

```{r}
if (run_scripts) source("script/02E-competition_phylogeny-01-pairs_taxonomy.R")
```

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/pairs_taxonomy.csv", col_types = cols())
```



## Isolates' 16S sequence difference


```{r}
if (run_scripts) source("script/02E-competition_phylogeny-02-pairs_16S.R")
```

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/pairs_16S.csv", col_types = cols())
```




# Summary

## Read and combine pairs data 

```{r}
if (run_scripts) source("script/02-pairs-01-read_data.R")
```

186 pairs of self-assembled communities and 112 pair from across-community and random networks

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/output/pairs.csv", col_types = cols())
```

Pairs with three initial frequencies and three. 186x3x2=1116

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/output/pairs_freq.csv", col_types = cols()) %>%
    filter(Set == "CFUandCASEU")
```

pairs_meta contains the growth traits of isolates. Note that the order of Isolate1 and Isolate2 in some pairs are flipped such that Isolate1 is always the dominant strain and Isolate 2 is the subdomaint one. Dominant strain is the winner in exclusion pairs, and the more abundant strain (50:50 treatment) in coexistence pairs.

```{r}
if (FALSE) pairs_meta <- read_csv("~/Dropbox/lab/emergent-coexistence/data/output/pairs_meta.csv", col_types = cols())
```

Example outcome types

```{r}
if (run_scripts) source("script/02-pairs-02-outcome_types.R")
```

Coarse-grained 

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/output/pairs_example_outcomes.csv", col_types = cols())
```
Fine-grained 

```{r}
read_csv("~/Dropbox/lab/emergent-coexistence/data/output/pairs_example_outcomes_finer.csv", col_types = cols())
```






